28 research outputs found
Tail bounds for all eigenvalues of a sum of random matrices
This work introduces the minimax Laplace transform method, a modification of
the cumulant-based matrix Laplace transform method developed in "User-friendly
tail bounds for sums of random matrices" (arXiv:1004.4389v6) that yields both
upper and lower bounds on each eigenvalue of a sum of random self-adjoint
matrices. This machinery is used to derive eigenvalue analogues of the
classical Chernoff, Bennett, and Bernstein bounds.
Two examples demonstrate the efficacy of the minimax Laplace transform. The
first concerns the effects of column sparsification on the spectrum of a matrix
with orthonormal rows. Here, the behavior of the singular values can be
described in terms of coherence-like quantities. The second example addresses
the question of relative accuracy in the estimation of eigenvalues of the
covariance matrix of a random process. Standard results on the convergence of
sample covariance matrices provide bounds on the number of samples needed to
obtain relative accuracy in the spectral norm, but these results only guarantee
relative accuracy in the estimate of the maximum eigenvalue. The minimax
Laplace transform argument establishes that, if the lowest eigenvalues decay
sufficiently fast, then on the order of (K^2 * r * log p)/eps^2 samples, where
K is the condition number of an optimal rank-r approximation to the covariance
matrix C, are sufficient to ensure that the dominant r eigenvalues of the
covariance matrix of a N(0, C) random vector are estimated to within a factor
of 1 +- eps with high probability.
Comment: 20 pages, 1 figure, see also arXiv:1004.4389v
Revisiting the Nystrom Method for Improved Large-Scale Machine Learning
We reconsider randomized algorithms for the low-rank approximation of
symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel
matrices that arise in data analysis and machine learning applications. Our
main results consist of an empirical evaluation of the performance quality and
running time of sampling and projection methods on a diverse suite of SPSD
matrices. Our results highlight complementary aspects of sampling versus
projection methods; they characterize the effects of common data preprocessing
steps on the performance of these algorithms; and they point to important
differences between uniform sampling and nonuniform sampling methods based on
leverage scores. In addition, our empirical results illustrate that existing
theory is so weak that it does not provide even a qualitative guide to
practice. Thus, we complement our empirical results with a suite of worst-case
theoretical bounds for both random sampling and random projection methods.
These bounds are qualitatively superior to existing bounds---e.g. improved
additive-error bounds for spectral and Frobenius norm error and relative-error
bounds for trace norm error---and they point to future directions to make these
algorithms useful in even larger-scale machine learning applications.
Comment: 60 pages, 15 color figures; updated proof of Frobenius norm bounds,
added comparison to projection-based low-rank approximations, and an analysis
of the power method applied to SPSD sketches
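The basic Nystrom construction the paper evaluates can be sketched in a few lines: sample a column subset, form the intersection block, and combine them through a pseudoinverse. The uniform sampling and matrix sizes below are illustrative only (the paper also studies leverage-score sampling and projection-based sketches):

```python
import numpy as np

rng = np.random.default_rng(1)

def nystrom(A, idx):
    # Nystrom approximation of an SPSD matrix from a column subset:
    # A_hat = C W^+ C^T, where C holds the sampled columns and W is
    # the intersection block C[idx, :]
    C = A[:, idx]
    W = C[idx, :]
    return C @ np.linalg.pinv(W) @ C.T

B = rng.standard_normal((20, 3))
A = B @ B.T                                  # exactly rank-3 SPSD matrix
idx = rng.choice(20, size=8, replace=False)  # uniform column sample
A_hat = nystrom(A, idx)
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
# for an exactly low-rank matrix, sampling enough columns to span its
# range recovers A essentially exactly
```

For kernel matrices that are only approximately low rank, the error instead depends on the decay of the spectrum and on how the columns are sampled, which is the regime the paper's experiments probe.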
The Masked Sample Covariance Estimator: An Analysis via Matrix Concentration Inequalities
Covariance estimation becomes challenging in the regime where the number p of
variables outstrips the number n of samples available to construct the
estimate. One way to circumvent this problem is to assume that the covariance
matrix is nearly sparse and to focus on estimating only the significant
entries. To analyze this approach, Levina and Vershynin (2011) introduce a
formalism called masked covariance estimation, where each entry of the sample
covariance estimator is reweighted to reflect an a priori assessment of its
importance. This paper provides a short analysis of the masked sample
covariance estimator by means of a matrix concentration inequality. The main
result applies to general distributions with at least four moments. Specialized
to the case of a Gaussian distribution, the theory offers qualitative
improvements over earlier work. For example, the new results show that n = O(B
log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth
B up to a relative spectral-norm error, in contrast to the sample complexity n
= O(B log^5 p) obtained by Levina and Vershynin.
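A minimal sketch of the masked estimator in the Levina-Vershynin formalism, assuming mean-zero data and a 0/1 banded mask (the mask, bandwidth, and covariance below are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

def masked_sample_covariance(X, M):
    # reweight each entry of the sample covariance by the mask M
    # (Hadamard product), assuming the rows of X are mean zero
    n = X.shape[0]
    return M * (X.T @ X / n)

p, n, bw = 30, 2000, 2
i = np.arange(p)
M = (np.abs(i[:, None] - i[None, :]) <= bw).astype(float)  # banded 0/1 mask
C = 0.2 * M                            # illustrative banded covariance
np.fill_diagonal(C, 1.0)               # diagonally dominant, hence PSD
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(C).T  # N(0, C) samples
S_masked = masked_sample_covariance(X, M)
rel_err = np.linalg.norm(S_masked - C, 2) / np.linalg.norm(C, 2)
```

Because the true covariance here is supported on the band, the mask discards only noise, and the relative spectral-norm error is small at a sample size far below what an unmasked estimate of a general p x p covariance would need.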
Compact Random Feature Maps
Kernel approximation using randomized feature maps has recently gained a lot
of interest. In this work, we identify that previous approaches for polynomial
kernel approximation create maps that are rank deficient, and therefore do not
utilize the capacity of the projected feature space effectively. To address
this challenge, we propose compact random feature maps (CRAFTMaps) to
approximate polynomial kernels more concisely and accurately. We prove the
error bounds of CRAFTMaps demonstrating their superior kernel reconstruction
performance compared to the previous approximation schemes. We show how
structured random matrices can be used to efficiently generate CRAFTMaps, and
present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class
classifiers. We present experiments on multiple standard datasets, with
performance competitive with state-of-the-art results.
Comment: 9 pages
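The two-stage idea can be sketched as follows, with Kar-Karnick-style product-of-projections features standing in for the up-projection and a plain Gaussian matrix standing in for the compression step; the paper's actual CRAFTMaps construction uses structured random matrices for efficiency and differs in detail:

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_random_features(X, degree, D, rng):
    # product-of-random-projections features whose inner products
    # approximate the homogeneous polynomial kernel (x . y)^degree
    n, d = X.shape
    Z = np.ones((n, D))
    for _ in range(degree):
        W = rng.choice([-1.0, 1.0], size=(d, D))  # Rademacher projections
        Z *= X @ W
    return Z / np.sqrt(D)

def craftmap(X, degree, D_up, D_down, rng):
    # two-stage sketch: up-project into a large feature space, then
    # compress with a random Gaussian matrix (illustrative stand-in
    # for the structured matrices used in the paper)
    Z = poly_random_features(X, degree, D_up, rng)
    G = rng.standard_normal((D_up, D_down)) / np.sqrt(D_down)
    return Z @ G

x = np.ones((1, 10)) / np.sqrt(10)       # unit vector, so (x . x)^2 = 1
Z = craftmap(x, degree=2, D_up=4096, D_down=512, rng=rng)
approx = (Z @ Z.T).item()                # estimate of the kernel value
```

The compressed map keeps the kernel estimate close to the target value while using far fewer output dimensions than the up-projection, which is the capacity argument the abstract makes about rank-deficient maps.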